README - wattenburger-job2-its/97

Summary

Samples | OTUs | OTU Total Count | OTU Table Density
215     | 3136 | 3,635,209       | 0.1083
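
The two derived figures in the Summary can be reproduced from an OTU-by-sample count matrix. The sketch below assumes "OTU Table Density" means the fraction of non-zero cells (this is how the biom-format library documents `Table.get_table_density()`), and that the mean per-sample count is the total count divided by the number of samples. `table_density` and `mean_count_per_sample` are hypothetical helper names, and the toy matrix is made up.

```python
from typing import Sequence


def table_density(table: Sequence[Sequence[float]]) -> float:
    """Fraction of non-zero cells in an OTU-by-sample count matrix."""
    n_rows = len(table)
    n_cols = len(table[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        raise ValueError("table must have at least one OTU and one sample")
    nonzero = sum(1 for row in table for value in row if value != 0)
    return nonzero / (n_rows * n_cols)


def mean_count_per_sample(total_count: float, n_samples: int) -> float:
    """Average number of assigned reads per sample."""
    return total_count / n_samples


# Toy 3-OTU x 4-sample table with made-up counts (not from this run).
toy = [
    [10, 0, 0, 5],
    [0, 0, 2, 0],
    [1, 0, 0, 0],
]
print(table_density(toy))  # 4 non-zero cells out of 12
print(mean_count_per_sample(3635209, 215))  # values from the Summary table
```

Applied to the Summary values, 3635209 / 215 ≈ 16907.95, which matches the Mean in the per-sample count table, and a density of 0.1083 over 215 × 3136 = 674,240 cells corresponds to 73,008 non-zero entries.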

Output

Chimera Removal

Chimeras are removed in two ways: de novo during clustering, using the UCHIME algorithm, and by reference-based filtering of the OTU seed sequences against the UNITE version 7 database.
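
The core idea behind chimera detection is that a chimeric sequence is explained better by splicing two parent sequences at a breakpoint than by any single parent. The toy sketch below illustrates only that two-parent idea; it is not UCHIME's scoring (which uses per-column votes, divergence ratios, and abundance skew). It assumes the sequences are already aligned to equal length, and `looks_chimeric` and its `min_gain` threshold are hypothetical.

```python
from typing import Sequence, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_two_parent_model(query: str, parents: Sequence[str]) -> Tuple[float, int, int, int]:
    """Best identity of `query` to any splice left_parent[:k] + right_parent[k:].

    Returns (identity, left_parent_index, right_parent_index, breakpoint).
    """
    best = (-1.0, -1, -1, -1)
    for i, left in enumerate(parents):
        for j, right in enumerate(parents):
            if i == j:
                continue
            for k in range(1, len(query)):
                score = identity(query, left[:k] + right[k:])
                if score > best[0]:
                    best = (score, i, j, k)
    return best


def looks_chimeric(query: str, parents: Sequence[str], min_gain: float = 0.05) -> bool:
    """Flag `query` if a two-parent splice explains it clearly better than any single parent."""
    best_single = max(identity(query, p) for p in parents)
    best_splice = best_two_parent_model(query, parents)[0]
    return best_splice - best_single >= min_gain


parent_a = "ACGT" * 5
parent_b = "TGCA" * 5
chimera = parent_a[:10] + parent_b[10:]   # left half from A, right half from B
point_variant = "ACGA" + parent_a[4:]     # A with a single substitution

print(looks_chimeric(chimera, [parent_a, parent_b]))        # True
print(looks_chimeric(point_variant, [parent_a, parent_b]))  # False
```

A single mismatch close to either end of a sequence can also be "explained" by a splice, which is one reason real tools weigh evidence across both segments rather than using a bare identity gain.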

Biom Table

Per-sample counts, as represented in the BIOM file (file1). Each count is the number of quality-filtered reads from a sample that were assigned to OTU seed sequences.

Minimum Count | Maximum Count | Median | Mean      | Standard Deviation
1             | 33,490        | 16,880 | 16,907.95 | 5,967.62
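
A per-sample count is the column sum of the OTU-by-sample table, and the statistics above summarize those sums. The sketch below is one way to compute them with the standard library. The report does not say whether its standard deviation is the sample (n − 1) or population (n) form; this sketch assumes the sample form. The helper names and toy counts are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def per_sample_totals(table: Sequence[Sequence[float]]) -> List[float]:
    """Column sums of an OTU-by-sample matrix: reads assigned to OTUs per sample."""
    return [sum(column) for column in zip(*table)]


def count_summary(counts: Sequence[float]) -> Dict[str, float]:
    """Summary statistics in the same order as the report's count table."""
    return {
        "minimum": min(counts),
        "maximum": max(counts),
        "median": statistics.median(counts),
        "mean": statistics.fmean(counts),
        # Sample standard deviation (n - 1); use statistics.pstdev for population (n).
        "standard_deviation": statistics.stdev(counts),
    }


# Toy 2-OTU x 4-sample table with made-up counts.
toy = [
    [6, 15, 20, 30],
    [4, 5, 10, 10],
]
totals = per_sample_totals(toy)
print(totals)                 # [10, 20, 30, 40]
print(count_summary(totals))
```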

Taxonomy was assigned to the OTU sequences at an overall cutoff of 0.8.

Taxonomy database: UNITE version 7
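
Taxonomy here comes from a least common ancestor (LCA) over the BLAST hits of each OTU. The report does not spell out how the overall cutoff enters that step, so the sketch below assumes one plausible reading: walking down the ranks, a name is kept only while at least `cutoff` of all hits share it, and `cutoff=1.0` reduces to a strict LCA. Representing lineages as tuples of UNITE-style rank-prefixed names (`k__`, `p__`, ...) and the function name `consensus_lineage` are assumptions for illustration.

```python
from collections import Counter
from typing import List, Sequence

Lineage = Sequence[str]  # e.g. ("k__Fungi", "p__Ascomycota", "c__Sordariomycetes")


def consensus_lineage(hits: Sequence[Lineage], cutoff: float = 1.0) -> List[str]:
    """Walk ranks from the top down, keeping the most common name while it is shared
    by at least `cutoff` of all hits. cutoff=1.0 gives a strict lowest common ancestor."""
    total = len(hits)
    assigned: List[str] = []
    supporting = list(hits)
    rank = 0
    while True:
        names = Counter(h[rank] for h in supporting if len(h) > rank)
        if not names:
            break
        name, count = names.most_common(1)[0]
        if count / total < cutoff:
            break
        assigned.append(name)
        # Only hits that agree with the lineage so far can support deeper ranks.
        supporting = [h for h in supporting if len(h) > rank and h[rank] == name]
        rank += 1
    return assigned


hits = [
    ("k__Fungi", "p__Ascomycota", "c__Sordariomycetes"),
    ("k__Fungi", "p__Ascomycota", "c__Sordariomycetes"),
    ("k__Fungi", "p__Ascomycota", "c__Dothideomycetes"),
    ("k__Fungi", "p__Ascomycota", "c__Sordariomycetes"),
    ("k__Fungi", "p__Basidiomycota", "c__Agaricomycetes"),
]
print(consensus_lineage(hits))       # ['k__Fungi']
print(consensus_lineage(hits, 0.8))  # ['k__Fungi', 'p__Ascomycota']
```

Relaxing the cutoff trades resolution against robustness: one off-target hit stops a strict LCA at kingdom, while a 0.8 threshold still resolves the phylum.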

OTU Sequences

The OTU sequences are available in FASTA format (file2) and as a Newick tree (file3).

To build the tree, the sequences were aligned with Clustal Omega [1], and FastTree 2 [2] was used to infer the phylogenetic tree from the alignment.
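
The tree file is plain Newick text: nested parentheses group clades, each node may carry a label and a `:branch_length`, and the string ends with `;`. The minimal parser below is a sketch for inspecting such a file, for example to check that every OTU ID in the FASTA file appears as a tip. It assumes unquoted labels and no bracketed comments; the OTU IDs in the example string are hypothetical.

```python
from typing import List, Optional, Tuple

# (label, branch_length, children); leaves have an empty children list.
Node = Tuple[str, Optional[float], list]


def parse_newick(text: str) -> Node:
    """Minimal Newick parser: unquoted labels, optional ':length', no comments."""
    text = text.strip()
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos].strip()

    def parse_node() -> Node:
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        label = read_until(",():;")
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return (label, length, children)

    tree = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree


def leaf_labels(node: Node) -> List[str]:
    """Tip labels in left-to-right order (for an OTU tree, the OTU IDs)."""
    label, _length, children = node
    if not children:
        return [label]
    return [name for child in children for name in leaf_labels(child)]


example = "((OTU_1:0.1,OTU_2:0.2)0.95:0.05,OTU_3:0.3);"
tree = parse_newick(example)
print(leaf_labels(tree))  # ['OTU_1', 'OTU_2', 'OTU_3']
```

The internal label `0.95` is kept as a string because FastTree writes local support values in that position. For full-featured parsing (quoted labels, comments), a library such as Biopython's `Bio.Phylo` is a better fit.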

Methods

Reads were quality filtered with BBDuk2 [3] to remove adapter sequences and PhiX, using a matching k-mer length of 31 bp at a Hamming distance of 1; reads shorter than 51 bp were discarded. Reads were merged using USEARCH [4] with a minimum length threshold of 175 bp and a maximum error rate of 1%. Sequences were dereplicated (minimum sequence abundance of 2) and clustered with the distance-based, greedy clustering method of USEARCH [5] at 97% pairwise sequence identity among operational taxonomic unit (OTU) member sequences. De novo prediction of chimeric sequences was performed with USEARCH during clustering. Taxonomy was assigned to OTU sequences using BLAST [6] alignments followed by least common ancestor (LCA) assignment against the UNITE version 7 database [7]. OTU seed sequences were also filtered against the UNITE version 7 database [8] with USEARCH to identify chimeric OTUs.
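
The dereplication and greedy clustering steps can be sketched as follows: identical reads are collapsed and uniques below the minimum abundance of 2 are dropped, then uniques are processed from most to least abundant, each joining an existing centroid within 97% identity or else founding a new OTU. This is a simplified illustration under stated assumptions, not USEARCH's implementation: identity is computed position by position on equal-length sequences (no alignment or indels), each unique joins the first qualifying centroid, and chimera filtering is not modeled. All names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(reads: Iterable[str], min_abundance: int = 2) -> List[Tuple[str, int]]:
    """Collapse identical reads; keep uniques seen at least `min_abundance` times,
    sorted by decreasing abundance (ties broken by sequence for determinism)."""
    counts = Counter(reads)
    uniques = [(seq, n) for seq, n in counts.items() if n >= min_abundance]
    uniques.sort(key=lambda item: (-item[1], item[0]))
    return uniques


def identity(a: str, b: str) -> float:
    # Simplification: compare equal-length sequences position by position (no indels).
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(uniques: List[Tuple[str, int]], min_identity: float = 0.97) -> List[List]:
    """Abundance-ordered greedy clustering: each unique joins the first existing
    centroid within `min_identity`, otherwise it becomes a new centroid."""
    otus: List[List] = []  # each entry: [centroid_sequence, total_abundance]
    for seq, n in uniques:
        for otu in otus:
            if identity(seq, otu[0]) >= min_identity:
                otu[1] += n
                break
        else:
            otus.append([seq, n])
    return otus


base = "ACGT" * 10                                         # 40 bp
one_off = base[:20] + "C" + base[21:]                      # 1 mismatch -> 97.5% identity
two_off = base[:10] + "T" + base[11:30] + "T" + base[31:]  # 2 mismatches -> 95% identity
reads = [base] * 5 + [one_off] * 3 + [two_off] * 2 + ["T" * 40]  # last read is a singleton

uniques = dereplicate(reads)
print([n for _, n in uniques])  # [5, 3, 2]  (singleton dropped)
print(greedy_cluster(uniques))  # two OTUs: base (5 + 3 reads) and two_off (2 reads)
```

Processing in decreasing abundance makes the most abundant sequences the centroids, on the reasoning that abundant uniques are more likely to be true biological sequences than error variants.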

References

  1. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, et al. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7: 539.
  2. Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One 5: e9490.
  3. Bushnell B. 2014. BBMap: a fast, accurate, splice-aware aligner. URL https://sourceforge.net/projects/bbmap/
  4. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19): 2460–2461. doi: 10.1093/bioinformatics/btq461
  5. Edgar RC. 2013. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods 10(10): 996–998.
  6. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10: 421.
  7. Kõljalg U, et al. 2013. Towards a unified paradigm for sequence-based identification of fungi. Mol Ecol 22(21): 5271–5277.
  8. Kõljalg U, et al. 2013. Towards a unified paradigm for sequence-based identification of fungi. Mol Ecol 22(21): 5271–5277.

All Files

This analysis produced more files than are presented here. They can be accessed from the results directory and are organized by your experiment ID (wattenburger-job2-its/97):

wattenburger-job2-its/97/                                 # clustering pairwise identity threshold
    ├── blast
    │   ├── blast_hits.txt                  # raw blast hits per OTU seed seq
    │   ├── lca_assignments.txt             # raw lca results TSV from blast hits
    │   ├── OTU.biom                        # tax annotated biom (no metadata, no normalization)
    │   ├── OTU_tax.fasta                   # otu seqs with tax in FASTA header
    │   ├── OTU.txt                         # tab delimited otu table with taxonomy
    │   └── README.html                     # results report when annotation method is 'blast'
    ├── logs
    │   ├── cluster_sequences.log
    │   ├── fasttree.log
    │   └── uniques.log
    ├── OTU_aligned.fasta                   # multiple alignment file of otu seed seqs
    ├── OTU.fasta                           # otu seqs without taxonomy
    ├── OTU.tree                            # newick tree of multiple alignment
    ├── utax
    │   ├── OTU.biom                        # tax annotated biom (no metadata, no normalization)
    │   ├── OTU_tax.fasta                   # otu seqs with tax in FASTA header
    │   ├── OTU.txt                         # tab delimited otu table with taxonomy
    │   ├── README.html                     # results report when annotation method is 'utax'
    │   └── utax_hits.txt                   # raw UTAX hits per OTU seed seq
    ├── demux
    │   ├── *.fastq.count
    │   └── *.fastq
    ├── logs
    │   ├── quality_filtering_stats.txt
    │   └── *.count
    ├── merged_?.fasta                      # error corrected FASTA prior to clustering into OTU seqs
    ├── merged.fastq                        # all sample reads merged into single file with updated headers
    └── quality_filter
        └── *.fastq                         # files that should have been cleaned up!
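
The tab-delimited OTU.txt files can be read without extra dependencies. The sketch below assumes a layout modelled on the TSV that biom-format's export typically produces: optional `#` comment lines, a header row whose first cell names the OTU ID column and whose last cell is `taxonomy`, then one row per OTU. That layout, the function name `read_otu_table`, and the sample and OTU IDs in the example are assumptions; check the header of your own file before relying on it.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def read_otu_table(lines: Iterable[str]) -> Tuple[List[str], Dict[str, Tuple[List[float], str]]]:
    """Parse a tab-delimited OTU table with a trailing taxonomy column.

    Assumed layout: optional comment lines, then a header row whose first cell names
    the OTU ID column and whose last cell is 'taxonomy', then one row per OTU.
    """
    header = None
    otus: Dict[str, Tuple[List[float], str]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0].strip():
            continue
        if header is None:
            if row[-1].strip().lower() == "taxonomy":
                header = row
            continue  # skip anything before the header row
        otu_id, *counts, taxonomy = row
        otus[otu_id] = ([float(c) for c in counts], taxonomy)
    if header is None:
        raise ValueError("no header row ending in a 'taxonomy' column was found")
    return header[1:-1], otus


example = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\ttaxonomy\n"
    "OTU_1\t10.0\t0.0\tk__Fungi,p__Ascomycota\n"
    "OTU_2\t3.0\t7.0\tk__Fungi\n"
)
samples, otus = read_otu_table(example)
totals = [sum(counts[i] for counts, _tax in otus.values()) for i in range(len(samples))]
print(samples, totals)  # ['S1', 'S2'] [13.0, 7.0]
```

When reading a real file, open it with `open(path, newline="")` and pass the handle in place of the `StringIO` object.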

Downloads

file1: OTU.biom
file2: OTU.fasta
file3: OTU.tree
file4: OTU.txt
Author: Joe Brown (joe.brown@pnnl.gov) | 2017-08-08